Corral Framework: Trustworthy and Fully Functional Data Intensive Parallel Astronomical Pipelines

Authors

  • Juan B. Cabral
  • Bruno Sánchez
  • Martin Beroiz
  • Mariano Domínguez
  • Marcelo Lares
  • Sebastián Gurovich
  • Pablo M. Granitto
Abstract

Data processing pipelines are one of the most common kinds of astronomical software. These programs are chains of processes that transform raw data into valuable information. In this work we present a Python framework for generating astronomical pipelines. It implements a design pattern (Model-View-Controller) on top of an SQL relational database. The framework handles custom data models, processing stages and result-communication alerts, and it produces automatic quality and structural measurements. This pattern separates the user logic and data models from the processing flow inside the pipeline, and it delivers multiprocessing and distributed computing capabilities for free. For the astronomical community this means an improvement over previous data processing pipelines. The programmer no longer has to deal with the processing flow and parallelization issues, and can focus only on the algorithms involved in the successive data transformations. This software, together with working examples of pipelines, is available to the community at https://github.com/toros-astro.
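The abstract describes the architecture only in prose. What follows is a minimal sketch of the same separation of concerns, written without the framework itself.

- **Database.** It assumes SQLite, via Python's standard `sqlite3` module, as the relational database.
- **Not Corral's API.** It does not use Corral's actual API. The `observation` table, the `Step` base class, the `Calibrate` and `Classify` stages, the gain and threshold values, and `bright_alert` are all hypothetical.
- **Roles.** The table plays the role of the model and the steps hold the user logic. `run_pipeline` acts as the controller that owns the processing flow.

```python
import sqlite3

# Model: one hypothetical table holding every observation and its processing state.
SCHEMA = """
CREATE TABLE IF NOT EXISTS observation (
    id INTEGER PRIMARY KEY,
    raw_flux REAL NOT NULL,
    calibrated_flux REAL,
    is_bright INTEGER,
    state TEXT NOT NULL DEFAULT 'raw'
)
"""


class Step:
    """A processing stage (hypothetical): it selects rows in `input_state`,
    transforms each one, and marks it with `output_state`. Steps never call
    each other; they communicate only through the database."""

    input_state = None
    output_state = None

    def process(self, obs):
        """Return a dict mapping column names to new values for one row."""
        raise NotImplementedError

    def run(self, conn):
        rows = conn.execute(
            "SELECT * FROM observation WHERE state = ?", (self.input_state,)
        ).fetchall()
        for obs in rows:
            updates = dict(self.process(obs), state=self.output_state)
            # Column names come from the step's own code, never from user input.
            assignments = ", ".join(f"{col} = ?" for col in updates)
            conn.execute(
                f"UPDATE observation SET {assignments} WHERE id = ?",
                (*updates.values(), obs["id"]),
            )
        conn.commit()
        return len(rows)  # number of rows this step handled


class Calibrate(Step):
    input_state, output_state = "raw", "calibrated"
    GAIN = 2.0  # hypothetical detector gain

    def process(self, obs):
        return {"calibrated_flux": obs["raw_flux"] * self.GAIN}


class Classify(Step):
    input_state, output_state = "calibrated", "classified"
    THRESHOLD = 10.0  # hypothetical brightness cut

    def process(self, obs):
        return {"is_bright": int(obs["calibrated_flux"] > self.THRESHOLD)}


def load(conn, fluxes):
    """Loader: inserts raw measurements into the model."""
    conn.executemany(
        "INSERT INTO observation (raw_flux) VALUES (?)", [(f,) for f in fluxes]
    )
    conn.commit()


def bright_alert(conn):
    """Alert: reports finished results. Here it returns messages
    instead of sending them anywhere."""
    rows = conn.execute(
        "SELECT id, calibrated_flux FROM observation "
        "WHERE state = 'classified' AND is_bright = 1 ORDER BY id"
    ).fetchall()
    return [f"observation {r['id']} is bright ({r['calibrated_flux']:.1f})" for r in rows]


def run_pipeline(conn, steps):
    """Controller: runs every step repeatedly until a full pass
    processes no rows, so the order of `steps` does not matter."""
    while sum(step.run(conn) for step in steps):
        pass


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows can be indexed by column name
    conn.execute(SCHEMA)
    load(conn, [1.5, 7.0, 12.0])
    run_pipeline(conn, [Classify(), Calibrate()])  # order deliberately reversed
    for message in bright_alert(conn):
        print(message)
```

Each step only reads and writes the shared database. Because of this, the controller can call the steps in any order, here deliberately reversed, and still reach the same final state. That independence is what allows a framework to hand each step to a separate process. This single-process sketch does not attempt that.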

Similar resources

Parallelizing XML data-streaming workflows via MapReduce

In prior work it has been shown that the design of scientific workflows can benefit from a collection-oriented modeling paradigm which views scientific workflows as pipelines of XML stream processors. In this paper, we present approaches for exploiting data parallelism in XML processing pipelines through novel compilation strategies to the Map-Reduce framework. Pipelines in our approach consist...

Parallelizing XML Processing Pipelines via MapReduce

We present approaches for exploiting data parallelism in XML processing pipelines through novel compilation strategies to the MapReduce framework. Pipelines in our approach consist of sequences of processing steps that consume XML-structured data and produce, often through calls to “black-box” functions, modified (i.e., updated) XML structures. Our main contributions are a set of strategies for...

Using Fuzzy Logic for Automatic Analysis of Astronomical Pipelines

Fundamental astronomical questions on the composition of the universe, the abundance of Earth-like planets, and the cause of the brightest explosions in the universe are being attacked by robotic telescopes costing billions of dollars and returning vast pipelines of data. The success of these programs depends on the accuracy of automated real time processing of the astronomical images. In this ...

OpenCluster: A Flexible Distributed Computing Framework for Astronomical Data Processing

The volume of data generated by modern astronomical telescopes is extremely large and rapidly growing. However, current high-performance data processing architectures/frameworks are not well suited for astronomers because of their limitations and programming difficulties. In this paper, we therefore present OpenCluster, an open-source distributed computing framework to support rapidly developin...

Data-Intensive Computing Infrastructure Systems for Unmodified Biological Data Analysis Pipelines

Biological data analysis is typically implemented using a deep pipeline that combines a wide array of tools and databases. These pipelines must scale to very large datasets, and consequently require parallel and distributed computing. It is therefore important to choose a hardware platform and underlying data management and processing systems well suited for processing large datasets. There are...
Journal:
  • CoRR

Volume: abs/1701.05566

Publication date: 2017